Fast Lookups for In-Memory Column Stores: Group-Key Indices, Lookup and Maintenance

Authors

  • Martin Faust
  • David Schwalb
  • Jens Krüger
  • Hasso Plattner
Abstract

In-memory column-oriented databases have become a major topic of interest in academia and commercial applications. The demand for analytics on up-to-the-minute data and the availability of systems with hundreds of gigabytes of main memory led to the proposal of combined systems, which provide a single database for operational processing and ad hoc analytical queries on current data. Recent research has identified in-memory column stores as a possible database architecture to meet these requirements. They are claimed to be capable of delivering analytical insights while providing sufficient transactional performance. Data in such systems is typically split into a write-optimized partition, which gains speed from its small size and tree-structured indices, and a larger read-only partition. To enable fast transactional and analytical performance, an index on the large, read-only partition is advisable in many cases. In this paper we present an index structure for the read-only partition, describe its advantage over the column scan, and present an algorithm for the maintenance of the index. The index drastically reduces the memory traffic during query execution, leading to faster lookups and joins, thereby providing benefits to transactional and analytical processing. We analyze the memory traffic of index lookups in comparison with full column scans, as well as the memory traffic of maintaining the index structure. We develop formulas to determine, at query runtime, whether an index lookup is preferable to a column scan. While other research claimed that an index for in-memory systems should simply be rebuilt after every bulk load, we show that a substantial performance increase can be achieved by reusing the former index to create an updated index.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This article was presented at: The Third International Workshop on Accelerating Data Management Systems using Modern Processor and Storage Architectures (ADMS’12). Copyright 2012.
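
The abstract contrasts an index lookup on the read-only partition with a full column scan but does not spell out the index layout. The following is only a minimal sketch of one plausible arrangement, assuming a dictionary-encoded column with a sorted dictionary, an offset vector with one entry per dictionary value, and a list of row positions grouped by value. All function and variable names are hypothetical, and this is not presented as the authors' implementation.

```python
from bisect import bisect_left


def build_group_key_index(dictionary_size, attribute_vector):
    """Build (offsets, positions) with a counting sort over value ids.

    Assumption: attribute_vector holds one value id per row, and
    positions[offsets[v]:offsets[v + 1]] lists the rows holding value id v.
    """
    offsets = [0] * (dictionary_size + 1)
    for value_id in attribute_vector:
        offsets[value_id + 1] += 1          # count occurrences per value id
    for v in range(dictionary_size):
        offsets[v + 1] += offsets[v]        # prefix sum -> start offsets

    positions = [0] * len(attribute_vector)
    cursor = offsets[:-1]                   # next free slot per value id (slice is a copy)
    for row, value_id in enumerate(attribute_vector):
        positions[cursor[value_id]] = row
        cursor[value_id] += 1
    return offsets, positions


def find_value_id(dictionary, value):
    """Binary search in the sorted dictionary; returns None if absent."""
    value_id = bisect_left(dictionary, value)
    if value_id == len(dictionary) or dictionary[value_id] != value:
        return None
    return value_id


def index_lookup(dictionary, offsets, positions, value):
    """Touch only the dictionary, two offsets, and the matching positions."""
    value_id = find_value_id(dictionary, value)
    if value_id is None:
        return []
    return positions[offsets[value_id]:offsets[value_id + 1]]


def scan_lookup(dictionary, attribute_vector, value):
    """Baseline: read every entry of the attribute vector."""
    value_id = find_value_id(dictionary, value)
    if value_id is None:
        return []
    return [row for row, v in enumerate(attribute_vector) if v == value_id]


# Hypothetical toy column: rows hold banana, apple, cherry, banana, apple, banana.
dictionary = ["apple", "banana", "cherry"]
attribute_vector = [1, 0, 2, 1, 0, 1]
offsets, positions = build_group_key_index(len(dictionary), attribute_vector)
banana_rows = index_lookup(dictionary, offsets, positions, "banana")
```

In this sketch the scan reads the whole attribute vector regardless of how many rows match, whereas the index lookup reads only the slice of positions for the requested value. That difference is the kind of memory-traffic trade-off the abstract refers to.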


Related resources

Composite Group-Keys - Space-Efficient Indexing of Multiple Columns for Compressed In-Memory Column Stores

Real world applications make heavy use of composite keys to reference entities. Indices over multiple columns are therefore mandatory to achieve response time goals of applications. We describe and evaluate the Composite Group-Key Index for fast tuple retrieval via composite keys from the compressed partition of in-memory column-stores with a main/delta architecture. Composite Group-Keys work d...


A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...


Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing

Hash tables are important data structures that lie at the heart of important applications such as key-value stores and relational databases. Typically bucketized cuckoo hash tables (BCHTs) are used because they provide high-throughput lookups and load factors that exceed 95%. Unfortunately, this performance comes at the cost of reduced memory access efficiency. Positive lookups (key is in the ta...


ChunkStash: Speeding Up Inline Storage Deduplication Using Flash Memory

Storage deduplication has received recent interest in the research community. In scenarios where the backup process has to complete within short time windows, inline deduplication can help to achieve higher backup throughput. In such systems, the method of identifying duplicate data, using disk-based indexes on chunk hashes, can create throughput bottlenecks due to disk I/Os involved in index l...



Publication date: 2012